🚀 Pipeline Results Report

Generated: /root/verbalize-sampling

Experiments: 4

Metrics: diversity, ngram, joke_quality

📋 Experiment Configurations

Name | Task | Method | Model | Responses | Temperature
direct (samples=1) | joke | direct | anthropic/claude-4-sonnet | 30 | 0.7
sequence [strict] (samples=5) | joke | sequence | anthropic/claude-4-sonnet | 30 | 0.7
multi_turn [strict] (samples=5) | joke | multi_turn | anthropic/claude-4-sonnet | 30 | 0.7
structure_with_prob [strict] (samples=5) | joke | structure_with_prob | anthropic/claude-4-sonnet | 30 | 0.7

📝 Generation Examples

direct (samples=1)

Method: direct
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms?

Because they make up everything!

Method: direct
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms?

Because they make up everything!

Method: direct
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms?

Because they make up everything!

sequence [strict] (samples=5)

Method: sequence
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

Method: sequence
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

Method: sequence
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

multi_turn [strict] (samples=5)

Method: multi_turn
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms?

Because they make up everything!

Method: multi_turn
Task: joke
Prompt:
Tell me a joke.
Response:
Why did the scarecrow win an award?

Because he was outstanding in his field!

Method: multi_turn
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't skeletons fight each other?

They don't have the guts!

structure_with_prob [strict] (samples=5)

Method: structure_with_prob
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

Method: structure_with_prob
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

Method: structure_with_prob
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

📊 Results Summary

Diversity Results

Note: Detailed arrays excluded from table: pairwise_diversities
Experiment | Average Diversity | Min Diversity | Max Diversity | Std Diversity | Average Response Length | Average Unique Words | Average Vocabulary Richness | Total Cost
direct (samples=1) | 0.0327 | 0.0000 | 0.2404 | 0.0724 | 13.2267 | 13.1667 | 0.9965 | 0.0000
sequence [strict] (samples=5) | 0.2992 | 0.0000 | 0.4896 | 0.1345 | 14.5467 | 14.0467 | 0.9754 | 0.0001
multi_turn [strict] (samples=5) | 0.2472 | 0.0000 | 0.4759 | 0.1306 | 14.9600 | 14.5733 | 0.9835 | 0.0001
structure_with_prob [strict] (samples=5) | 0.2973 | 0.0000 | 0.4958 | 0.1415 | 13.3400 | 12.8800 | 0.9747 | 0.0001
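
The word-level columns in the table above (Average Response Length, Average Unique Words, Average Vocabulary Richness) can be illustrated with a short sketch. It assumes whitespace tokenization, lowercasing, and richness defined as unique words divided by total words; the report does not state how the pipeline computes these values, and the function names word_stats and average_word_stats are hypothetical.

```python
from statistics import mean


def word_stats(text):
    """Return (word count, unique word count, vocabulary richness) for one response.

    Assumption: words are lowercase whitespace-separated tokens, and
    richness is unique words / total words (a type-token ratio).
    """
    words = text.lower().split()
    unique = set(words)
    richness = len(unique) / len(words) if words else 0.0
    return len(words), len(unique), richness


def average_word_stats(responses):
    """Average each statistic over a list of responses."""
    stats = [word_stats(r) for r in responses]
    # zip(*stats) regroups the per-response tuples into three columns.
    return tuple(mean(column) for column in zip(*stats))


if __name__ == "__main__":
    responses = [
        "Why don't scientists trust atoms? Because they make up everything!",
        "Why did the scarecrow win an award? Because he was outstanding in his field!",
    ]
    length, unique, richness = average_word_stats(responses)
    print(f"avg length={length:.4f} avg unique={unique:.4f} avg richness={richness:.4f}")
```

Under this definition a short joke with no repeated words scores 1.0, which may explain why every experiment in the table sits close to 1.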

📈 Diversity Visualizations

[Figure: Aggregate Metrics]
[Figure: Pairwise Diversities Distribution]
[Figure: Response Length Distribution]
[Figure: Unique Words Distribution]
[Figure: Vocabulary Richness Distribution]

Ngram Results

Note: Detailed arrays excluded from table: pairwise_rouge_l_scores
Experiment | Average Rouge L | Min Rouge L | Max Rouge L | Std Rouge L | Average Response Length
direct (samples=1) | 0.8692 | 0.1333 | 1.0000 | 0.2677 | 13.2267
sequence [strict] (samples=5) | 0.1817 | 0.0000 | 1.0000 | 0.3051 | 14.5467
multi_turn [strict] (samples=5) | 0.2943 | 0.0000 | 1.0000 | 0.3139 | 14.9600
structure_with_prob [strict] (samples=5) | 0.1902 | 0.0000 | 1.0000 | 0.3362 | 13.3400
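
As a rough illustration of what a pairwise ROUGE-L score measures, the sketch below computes an F-measure based on the longest common subsequence (LCS) for every pair of responses. The whitespace tokenization, the balanced F1 variant, and the function names are assumptions; the pipeline's actual scorer may differ.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over lowercase whitespace tokens (assumed tokenization)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def pairwise_rouge_l(responses):
    """Score every unordered pair of responses."""
    return [rouge_l_f1(a, b) for a, b in combinations(responses, 2)]


responses = [
    "Why don't scientists trust atoms? Because they make up everything!",
    "Why don't scientists trust atoms? Because they make up everything!",
    "Why did the scarecrow win an award? Because he was outstanding in his field!",
]
scores = pairwise_rouge_l(responses)
```

Identical responses score 1.0 under this definition, so a method that keeps returning the same joke is pulled toward a high average, which is consistent with the direct row in the table above.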

📈 Ngram Visualizations

[Figure: Aggregate Metrics]
[Figure: Pairwise Rouge L Scores Distribution]
[Figure: Response Length Distribution]

Joke Quality Results

Experiment | Funniness | Cleverness | Originality | Structure | Overall | Normalized Overall
direct (samples=1) | 3.7933 | 3.8800 | 2.5800 | 4.2667 | 3.6433 | 0.7287
sequence [strict] (samples=5) | 3.5733 | 3.8867 | 2.5800 | 4.1800 | 3.5440 | 0.7088
multi_turn [strict] (samples=5) | 3.3933 | 3.7800 | 2.5733 | 4.1333 | 3.4370 | 0.6874
structure_with_prob [strict] (samples=5) | 3.4733 | 3.8000 | 2.4067 | 4.2133 | 3.4527 | 0.6905
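
The Normalized Overall values above are numerically consistent with dividing Overall by a maximum score of 5. That scale is inferred from the numbers, not stated in the report, so the short check below treats MAX_SCORE as an assumption.

```python
# Overall and Normalized Overall values copied from the table above.
overall = {
    "direct (samples=1)": 3.6433,
    "sequence [strict] (samples=5)": 3.5440,
    "multi_turn [strict] (samples=5)": 3.4370,
    "structure_with_prob [strict] (samples=5)": 3.4527,
}
reported_normalized = {
    "direct (samples=1)": 0.7287,
    "sequence [strict] (samples=5)": 0.7088,
    "multi_turn [strict] (samples=5)": 0.6874,
    "structure_with_prob [strict] (samples=5)": 0.6905,
}

MAX_SCORE = 5.0  # assumption: scores are on a scale with a maximum of 5

# Recompute the normalized score for each experiment.
recomputed = {name: round(score / MAX_SCORE, 4) for name, score in overall.items()}

for name, value in recomputed.items():
    print(f"{name}: recomputed={value:.4f} reported={reported_normalized[name]:.4f}")
```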

📈 Joke Quality Visualizations

[Figure: Aggregate Metrics]
[Figure: Cleverness.Score Distribution]
[Figure: Funniness.Score Distribution]
[Figure: Originality.Score Distribution]
[Figure: Structure.Score Distribution]

📈 Overall Comparison

📈 Comparison Chart Visualizations

[Figure: Comparison Chart]